Rumbator: a Flamenco Rumba Cover Version Generator Based on Audio Processing at Note Level
In this article, we present a scheme that automatically generates polyphonic flamenco rumba versions from monophonic melodies. We first analyze the parameters that define the flamenco rumba style, and then propose a method for transforming a generic monophonic audio signal into that style. The method first transcribes the monophonic audio signal into a symbolic representation, and then applies a set of note-level audio transformations grounded in music theory to render it in the polyphonic flamenco rumba style. Audio examples produced by the transformation software are also provided.
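As a rough illustration of the kind of pipeline described above (not the paper's actual system), the sketch below pitch-tracks a monophonic recording with librosa's pYIN, segments the pitch track into note events, and applies a toy note-level transformation that stacks a triad under each melody note. The input file name, the pitch range, and the harmonization rule are all illustrative assumptions.

```python
import numpy as np
import librosa

def transcribe_monophonic(path, sr=22050):
    """Return (onset_time, duration, midi_pitch) note events from a monophonic recording."""
    y, sr = librosa.load(path, sr=sr)
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                                 fmax=librosa.note_to_hz('C6'), sr=sr)
    times = librosa.times_like(f0, sr=sr)
    notes, start, pitch = [], None, None
    for t, v, f in zip(times, voiced, f0):
        midi = int(round(librosa.hz_to_midi(f))) if v and not np.isnan(f) else None
        if midi != pitch:                        # note boundary: pitch changed or voicing toggled
            if pitch is not None:
                notes.append((start, t - start, pitch))
            start, pitch = t, midi
    if pitch is not None:                        # close a note still open at the end
        notes.append((start, times[-1] - start, pitch))
    return notes

def harmonize(notes):
    """Toy note-level transformation: stack a triad below each melody note (illustrative only)."""
    return [(onset, dur, [p, p - 4, p - 7]) for (onset, dur, p) in notes]

# 'melody.wav' is a hypothetical input file
events = harmonize(transcribe_monophonic('melody.wav'))
```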
Relative Music Loudness Estimation Using Temporal Convolutional Networks and a CNN Feature Extraction Front-End
Relative music loudness estimation is a MIR task that consists of dividing audio into segments of three classes: Foreground Music, Background Music, and No Music. Given the temporal correlation of music, we approach the task with a type of network able to model temporal context: the Temporal Convolutional Network (TCN). We propose two architectures: a plain TCN, and a novel architecture that combines a TCN with a Convolutional Neural Network (CNN) front-end, which we name CNN-TCN. We expect the CNN front-end to act as a feature extraction stage that makes more efficient use of the network's parameters. Using the OpenBMAT dataset, we train and test 40 TCN and 80 CNN-TCN models in two grid searches over a set of hyper-parameters, and compare them with the two best algorithms submitted to the music detection and relative music loudness estimation tasks in MIREX 2019. All of our models outperform the MIREX algorithms, even with fewer parameters. The CNN-TCN emerges as the best architecture, as all of its models outperform all TCN models. We show that adding a CNN front-end to a TCN can reduce the number of parameters while improving performance: the front-end effectively works as a feature extractor, producing consistent patterns that identify different combinations of music and non-music sounds, and it also yields a smoother output than the TCN models.
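As a rough sketch of how a CNN front-end can feed a temporal convolution stack (not the authors' implementation; the log-mel input shape, layer sizes, and residual-block design are assumptions), the following PyTorch code collapses the frequency axis with 2-D convolutions and then models temporal context with dilated 1-D residual blocks, producing frame-wise logits over the three classes.

```python
import torch
import torch.nn as nn

class CNNFrontEnd(nn.Module):
    """2-D convolutions that collapse the frequency axis into one feature vector per frame."""
    def __init__(self, out_channels=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),          # pool only along frequency
            nn.Conv2d(16, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),           # collapse remaining frequency bins
        )

    def forward(self, x):                               # x: (batch, 1, n_mels, n_frames)
        x = self.conv(x)                                # (batch, channels, 1, n_frames)
        return x.squeeze(2)                             # (batch, channels, n_frames)

class TCNBlock(nn.Module):
    """Dilated 1-D convolution with a residual connection; preserves the frame count."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation),
            nn.ReLU(),
        )

    def forward(self, x):
        return x + self.net(x)

class CNNTCN(nn.Module):
    def __init__(self, channels=32, n_classes=3, n_blocks=4):
        super().__init__()
        self.front_end = CNNFrontEnd(channels)
        self.tcn = nn.Sequential(
            *[TCNBlock(channels, dilation=2 ** i) for i in range(n_blocks)]
        )
        self.classifier = nn.Conv1d(channels, n_classes, kernel_size=1)

    def forward(self, x):                               # x: (batch, 1, n_mels, n_frames)
        feats = self.front_end(x)
        feats = self.tcn(feats)
        return self.classifier(feats)                   # frame-wise logits: (batch, n_classes, n_frames)

# Example: one clip of 1000 frames over 96 mel bands
logits = CNNTCN()(torch.randn(1, 1, 96, 1000))
print(logits.shape)  # torch.Size([1, 3, 1000])
```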